feat: extend postgres vector store with paradedb BM25 support #20024

Restodecoca · 2025-10-04T22:36:53Z

Description

This PR introduces the ParadeDB vector store integration, extending the PostgreSQL-based store to support BM25 + vector search using ParadeDB.

This implementation is based on llama-index-vector-stores-postgres, and has been refactored to inherit directly from PGVectorStore, reducing duplicated logic and ensuring full compatibility with the PostgreSQL backend.

It also supports custom query execution, enabling advanced hybrid retrieval use cases through ParadeDB’s enhanced search engine.

This PR:

Adds full ParadeDB compatibility (schema, extensions, and BM25 index creation).
Supports hybrid dense + sparse retrieval (BM25 + embeddings) via ParadeDB’s pg_search extension.
Refactors the ParadeDB store to extend PGVectorStore while overriding BM25-specific behavior.
Reintroduces custom query support, improving flexibility for advanced search operations.
Includes a new README.md with installation instructions.
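
For readers unfamiliar with hybrid dense + sparse retrieval, the sketch below shows one common way to merge a BM25 result list with a vector result list: reciprocal rank fusion. This is an illustration only and is not taken from this PR; the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample IDs are all assumptions, and the store may combine results differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists into one list, best first.

    Each document earns 1 / (k + rank) for every list it appears in,
    where rank starts at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score descending; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from a BM25 query, one from a vector query.
bm25_ids = ["ddd", "ccc", "aaa"]
vector_ids = ["ccc", "bbb", "ddd"]
fused = reciprocal_rank_fusion([bm25_ids, vector_ids])
print(fused)
```

Documents that rank well in both lists rise to the top, and no score normalization between BM25 and cosine similarity is needed, which is why rank-based fusion is a frequent default for hybrid search.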

Fixes

Fixes #
(or leave blank if this is a new feature without a linked issue)

New Package?

Yes — llama-index-vector-stores-paradedb
No

A detailed README.md was added with usage examples, setup instructions, and integration notes.
The tool.poetry.dependencies.llama-index-vector-stores-postgres reference is also declared in pyproject.toml.

Version Bump

Yes — bumped version to 0.1.0
No

Type of Change

New feature (non-breaking change that adds functionality)
Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

How Has This Been Tested?

Added new unit tests to cover BM25 ranking and hybrid retrieval
Verified compatibility with PostgreSQL and ParadeDB backends
Compared tsvector vs. BM25 ranking results
Confirmed all existing tests (pytest) pass successfully
Additional manual validation in Dockerized environments

Example Output

--- TSVECTOR RESULTS ---
Rank 1: ID=ccc, Score=0.06079
Rank 2: ID=ddd, Score=0.06079

--- BM25 RESULTS ---
Rank 1: ID=ddd, Score=0.67853
Rank 2: ID=ccc, Score=0.50741
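
To give some intuition for why BM25 can separate documents that another ranking scores identically, here is a minimal sketch of the standard Okapi BM25 formula with a Lucene-style IDF. It is not the code path ParadeDB runs and it does not reproduce the scores above; the toy corpus, the whitespace tokenization, and the parameters `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math


def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document against a tokenized corpus."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue
        # Number of documents that contain the term at least once.
        df = sum(1 for d in corpus if term in d)
        # Lucene-style IDF, which is always non-negative.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer documents are penalized.
        norm = tf + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


# Made-up toy corpus; the IDs only echo the ones in the example output.
corpus = {
    "ccc": "postgres stores vectors and text in many tables for search".split(),
    "ddd": "postgres search".split(),
    "eee": "unrelated note".split(),
}
docs = list(corpus.values())
scores = {
    doc_id: bm25_score(["search"], tokens, docs)
    for doc_id, tokens in corpus.items()
}
for doc_id, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"ID={doc_id}, Score={s:.5f}")
```

Both `ccc` and `ddd` contain the query term exactly once, but the shorter document scores higher because of length normalization, so the two no longer tie.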

Suggested Checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation (README.md)
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have added Google Colab support for the newly added notebooks
I have added tests that prove my feature works
New and existing unit tests pass locally with my changes
I ran uv run ruff check --fix . and uv run ruff format . to appease the lint gods

Summary

ParadeDB now fully inherits from PGVectorStore, providing a cleaner, more maintainable implementation.
BM25 and hybrid retrieval are both supported natively.
Custom query execution is re-enabled, restoring flexibility for advanced retrieval logic.
Reduces the code by 2K lines while retaining full test coverage.
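
The subclassing approach summarized above can be sketched roughly as follows. The classes here are hypothetical stand-ins, not the real `PGVectorStore` or the store added in this PR; the method names, the `text_search_tsv` column, and the table name are assumptions. The BM25 query fragment assumes ParadeDB's `@@@` operator and `paradedb.score` function and should be checked against the installed pg_search version.

```python
class BaseTextSearchStore:
    """Hypothetical stand-in for a Postgres-backed store (not the real PGVectorStore)."""

    def __init__(self, table_name):
        self.table_name = table_name

    def sparse_query_sql(self):
        # Default full-text ranking via PostgreSQL's built-in ts_rank.
        return (
            "SELECT id, ts_rank(text_search_tsv, plainto_tsquery(:q)) AS score "
            f"FROM {self.table_name} "
            "WHERE text_search_tsv @@ plainto_tsquery(:q) "
            "ORDER BY score DESC LIMIT :k"
        )

    def describe(self):
        # Shared logic lives in the base class and is inherited unchanged.
        return f"{type(self).__name__} on {self.table_name}"


class BM25TextSearchStore(BaseTextSearchStore):
    """Overrides only the sparse query; everything else is inherited."""

    def sparse_query_sql(self):
        # Assumed ParadeDB-style BM25 query; verify against your pg_search version.
        return (
            "SELECT id, paradedb.score(id) AS score "
            f"FROM {self.table_name} "
            "WHERE text @@@ :q "
            "ORDER BY score DESC LIMIT :k"
        )


store = BM25TextSearchStore("data_docs")
print(store.describe())
print(store.sparse_query_sql())
```

Keeping the override to a single method is what lets the subclass drop duplicated logic: connection handling, inserts, and dense search stay in the parent.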

llama-index-integrations/vector_stores/llama-index-vector-store-paradedb/README.md

logan-markewich

Did this really need to copy paste 1K lines from the postgres integration vs. just subclassing it?

Restodecoca · 2025-10-07T17:21:10Z

> Did this really need to copy paste 1K lines from the postgres integration vs. just subclassing it?

Yeah, I think you're right. I'm gonna redo it by subclassing pgvector to simplify. Thanks for the feedback!

dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Oct 4, 2025

logan-markewich reviewed Oct 6, 2025


Restodecoca added 10 commits October 10, 2025 18:29

chore: initial project setup and configuration files

9a63b0f

feat: add ParadeDBVectorStore implementation with BM25 and hybrid search support

b853500

docs: add README with ParadeDB setup and usage instructions

29698f0

test: add initial pytest coverage for ParadeDB hybrid search

71bc014

refactor: inherit from PGVectorStore and integrate BM25 support

3d303b4

test: simplify tests to focus only on BM25 functions

9b7aa93

doc: add instalation guide and scores from BM25 x TSVector

6cb1b92

doc: add instalation guide and scores from BM25 x TSVector

a9111dd

fix(deps): missing postgres vector store dependency in pyproject.toml

3f91652

fix: run linter and format

1e77573

Restodecoca force-pushed the paradedb branch from 80f3f9e to 1e77573 October 10, 2025 21:29

Restodecoca requested a review from logan-markewich October 10, 2025 23:06
